NSF PAR Search | NSF Public Access Repository


Time series predictions in unmonitored sites: a survey of machine learning techniques in water resources

https://doi.org/10.1017/eds.2024.14

Willard, Jared D; Varadharajan, Charuleka; Jia, Xiaowei; Kumar, Vipin (January 2025, Environmental Data Science)

Abstract Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world’s freshwater resources have inadequate monitoring of critical environmental variables needed for management. Yet, the need to have widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades, and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction with their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics and process knowledge into classical, deep learning, and transfer learning methodologies. The analysis here suggests most prior efforts have been focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions in unmonitored sites that include incorporating dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques in modern machine learning frameworks.
Full Text Available
Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications

https://doi.org/10.1137/1.9781611977653.ch73

Xu, Shaoming; Khandelwal, Ankush; Li, Xiang; Jia, Xiaowei; Liu, Licheng; Willard, Jared; Ghosh, Rahul; Cutler, Kelly; Steinbach, Michael; Duffy, Christopher; et al (April 2023, Proceedings of the 2023 SIAM International Conference on Data Mining (SDM))
Shekhar, Shashi; Zhou, Zhi-Hua; Chiang, Yao-Yi; Stiglic, Gregor (Ed.)
In many environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies. However, due to mini-batch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered, which can lead to limited performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there exists a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies, and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves the performance, it leads to much longer training times due to highly sequential training. To address this issue, we further propose a new strategy which augments a training segment with an initial value of the target variable from the timestep immediately before the start of the training segment. In other words, we provide an initial value of the target variable as additional input so that the network can focus on learning changes relative to that initial value. By using this strategy, samples can be passed in any order (mini-batch training), which significantly reduces the training time while maintaining the performance. In demonstrating the utility of our approach in hydrological modeling, we observe that the most significant gains in predictive accuracy occur when these methods are applied to state variables whose values change more slowly, such as soil water and snowpack, rather than continuously moving flux variables such as streamflow.
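The second strategy in this abstract (giving each training segment the target value from the timestep just before it starts) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `make_segments`, the choice to repeat the initial value as an extra input column at every timestep, and the use of non-overlapping segments are all assumptions.

```python
import numpy as np

def make_segments(inputs, target, seg_len):
    """Cut a series into fixed-length segments and augment each one with the
    target value observed one timestep before the segment begins.

    inputs: array of shape (n_times, n_features)
    target: array of shape (n_times,)
    Returns X of shape (n_segments, seg_len, n_features + 1) and
    Y of shape (n_segments, seg_len).
    """
    X, Y = [], []
    # Start at index 1 so every segment has a preceding timestep.
    for start in range(1, len(target) - seg_len + 1, seg_len):
        init = target[start - 1]                 # value just before the segment
        seg_x = inputs[start:start + seg_len]
        init_col = np.full((seg_len, 1), init)   # assumed: repeat as an extra column
        X.append(np.hstack([seg_x, init_col]))
        Y.append(target[start:start + seg_len])
    return np.stack(X), np.stack(Y)

# Because each segment carries its own initial condition, segments can be
# shuffled freely before being grouped into mini-batches.
inputs = np.arange(20, dtype=float).reshape(10, 2)
target = np.arange(10, dtype=float) * 10.0
X, Y = make_segments(inputs, target, seg_len=3)
order = np.random.default_rng(0).permutation(len(X))
X_shuffled, Y_shuffled = X[order], Y[order]
```

Since no hidden state has to be carried from one segment to the next, the order of segments no longer matters, which is what makes ordinary shuffled mini-batch training possible here.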
Full Text Available
Daily surface temperatures for 185,549 lakes in the conterminous United States estimated using deep learning (1980–2020)

https://doi.org/10.1002/lol2.10249

Willard, Jared D.; Read, Jordan S.; Topp, Simon; Hansen, Gretchen J.; Kumar, Vipin (August 2022, Limnology and Oceanography Letters)

Full Text Available
Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems

https://doi.org/10.1145/3514228

Willard, Jared; Jia, Xiaowei; Xu, Shaoming; Steinbach, Michael; Kumar, Vipin (January 2022, ACM Computing Surveys)

There is a growing consensus that solutions to complex science and engineering problems require novel methodologies that are able to integrate traditional physics-based modeling approaches with state-of-the-art machine learning (ML) techniques. This paper provides a structured overview of such techniques. Application-centric objective areas for which these approaches have been applied are summarized, and then classes of methodologies used to construct physics-guided ML models and hybrid physics-ML frameworks are described. We then provide a taxonomy of these existing techniques, which uncovers knowledge gaps and potential crossovers of methods between disciplines that can serve as ideas for future research.
Full Text Available
Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles

https://doi.org/10.1145/3447814

Jia, Xiaowei; Willard, Jared; Karpatne, Anuj; Read, Jordan S.; Zwart, Jacob A.; Steinbach, Michael; Kumar, Vipin (May 2021, ACM/IMS Transactions on Data Science)
Physics-based models are often used to study engineering and environmental systems. The ability to model these systems is the key to achieving our future environmental sustainability and improving the quality of human life. This article focuses on simulating lake water temperature, which is critical for understanding the impact of changing climate on aquatic ecosystems and assisting in aquatic resource management decisions. The General Lake Model (GLM) is a state-of-the-art physics-based model used for addressing such problems. However, like other physics-based models used for studying scientific and engineering systems, it has several well-known limitations due to simplified representations of the physical processes being modeled or challenges in selecting appropriate parameters. While state-of-the-art machine learning models can sometimes outperform physics-based models given an ample amount of training data, they can produce results that are physically inconsistent. This article proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models to leverage their complementary strengths and improve the modeling of physical processes. Specifically, we show that a PGRNN can improve prediction accuracy over that of physics-based models (by over 20% even with very little training data), while generating outputs consistent with physical laws. An important aspect of our PGRNN approach lies in its ability to incorporate the knowledge encoded in physics-based models. This allows training the PGRNN model using very little observed data while also ensuring high prediction accuracy. Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines where physics-based (also known as mechanistic) models are used.
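One common way to push a learned model toward physically consistent outputs is to add a penalty for physical violations to the usual data-fit loss. The sketch below illustrates that general idea only; the abstract does not say which physical laws the PGRNN enforces or how. The density-increases-with-depth constraint, the empirical density formula, the weight `lam`, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def water_density(temp_c):
    """Approximate freshwater density (kg/m^3) from temperature in deg C.
    Assumed empirical formula, used here only to build an example constraint."""
    t = np.asarray(temp_c, dtype=float)
    return 1000.0 * (1.0 - (t + 288.9414) * (t - 3.9863) ** 2
                     / (508929.2 * (t + 68.12963)))

def physics_guided_loss(pred, obs, lam=1.0):
    """Data-fit error plus a penalty for physically implausible profiles.

    pred, obs: arrays of shape (n_depths, n_times), depth increasing along
    axis 0. NaN entries in obs mark missing observations.
    """
    mask = ~np.isnan(obs)
    mse = np.mean((pred[mask] - obs[mask]) ** 2)   # uses observed entries only
    rho = water_density(pred)
    # Assumed constraint: denser water should not sit above lighter water.
    violation = np.maximum(rho[:-1] - rho[1:], 0.0)
    return float(mse + lam * np.mean(violation))

stable = np.array([[20.0], [15.0], [10.0]])     # warm surface, cold bottom
inverted = np.array([[10.0], [15.0], [20.0]])   # cold surface, warm bottom
loss_stable = physics_guided_loss(stable, stable)
loss_inverted = physics_guided_loss(inverted, inverted)
```

The penalty term needs no observations, so it can be evaluated on every prediction, including timesteps and depths with no measurements. That is one plausible reason such a loss helps when observed data are scarce.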
Full Text Available
Can machine learning accelerate process understanding and decision‐relevant predictions of river water quality?

https://doi.org/10.1002/hyp.14565

Varadharajan, Charuleka; Appling, Alison P.; Arora, Bhavna; Christianson, Danielle S.; Hendrix, Valerie C.; Kumar, Vipin; Lima, Aranildo R.; Müller, Juliane; Oliver, Samantha; Ombadi, Mohammed; et al (April 2022, Hydrological Processes)

Full Text Available
Process‐Guided Deep Learning Predictions of Lake Water Temperature

https://doi.org/10.1029/2019WR024922

Read, Jordan S.; Jia, Xiaowei; Willard, Jared; Appling, Alison P.; Zwart, Jacob A.; Oliver, Samantha K.; Karpatne, Anuj; Hansen, Gretchen J.; Hanson, Paul C.; Watkins, William; et al (November 2019, Water Resources Research)

Full Text Available
Predicting Water Temperature Dynamics of Unmonitored Lakes With Meta‐Transfer Learning

https://doi.org/10.1029/2021WR029579

Willard, Jared D.; Read, Jordan S.; Appling, Alison P.; Oliver, Samantha K.; Jia, Xiaowei; Kumar, Vipin (June 2021, Water Resources Research)

Abstract Most environmental data come from a minority of well‐monitored sites. An ongoing challenge in the environmental sciences is transferring knowledge from monitored sites to unmonitored sites. Here, we demonstrate a novel transfer‐learning framework that accurately predicts depth‐specific temperature in unmonitored lakes (targets) by borrowing models from well‐monitored lakes (sources). This method, meta‐transfer learning (MTL), builds a meta‐learning model to predict transfer performance from candidate source models to targets using lake attributes and candidates' past performance. We constructed source models at 145 well‐monitored lakes using calibrated process‐based (PB) modeling and a recently developed approach called process‐guided deep learning (PGDL). We applied MTL to either PB or PGDL source models (PB‐MTL or PGDL‐MTL, respectively) to predict temperatures in 305 target lakes treated as unmonitored in the Upper Midwestern United States. We show significantly improved performance relative to the uncalibrated PB General Lake Model, where the median root mean squared error (RMSE) for the target lakes is 2.52°C. PB‐MTL yielded a median RMSE of 2.43°C; PGDL‐MTL yielded 2.16°C; and a PGDL‐MTL ensemble of nine sources per target yielded 1.88°C. For sparsely monitored target lakes, PGDL‐MTL often outperformed PGDL models trained on the target lakes themselves. Differences in maximum depth between the source and target were consistently the most important predictors. Our approach readily scales to thousands of lakes in the Midwestern United States, demonstrating that MTL with meaningful predictor variables and high‐quality source models is a promising approach for many kinds of unmonitored systems and environmental variables.
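The source-selection step described in this abstract can be sketched as follows: a meta-model learns to predict the error of transferring a source model to a target from attribute differences, and the sources with the lowest predicted error are chosen. This is a simplified illustration under stated assumptions, not the published method: the linear least-squares meta-model, the single maximum-depth attribute, the toy numbers, and the function names are all hypothetical.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    diff = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def fit_meta_model(meta_features, transfer_rmse):
    """Fit a linear meta-model (assumed form) mapping source-target attribute
    differences to the RMSE observed when transferring between monitored sites."""
    A = np.column_stack([meta_features, np.ones(len(meta_features))])
    coef, *_ = np.linalg.lstsq(A, transfer_rmse, rcond=None)
    return coef

def rank_sources(coef, source_attrs, target_attrs, k=1):
    """Return indices of the k sources with the lowest predicted transfer RMSE."""
    feats = np.abs(source_attrs - target_attrs)
    A = np.column_stack([feats, np.ones(len(feats))])
    predicted = A @ coef
    return np.argsort(predicted)[:k], predicted

# Toy meta-training data: absolute difference in maximum depth (m) between
# pairs of monitored lakes, and the RMSE seen when one lake's model was
# applied to the other. Values are made up for illustration.
depth_diff = np.array([[5.0], [20.0], [25.0], [5.0], [20.0], [25.0]])
observed_rmse = 1.0 + 0.1 * depth_diff[:, 0]
coef = fit_meta_model(depth_diff, observed_rmse)

source_depths = np.array([[5.0], [10.0], [30.0]])
target_depth = np.array([11.0])
best, predicted = rank_sources(coef, source_depths, target_depth, k=2)
```

With `k` greater than 1, the selected source models could be averaged into an ensemble, similar in spirit to the nine-source ensemble the abstract reports.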
